
26
The hash function described in Figure 2.5 is not necessarily
the best with respect to table distribution, but does have the merit of
extreme simplicity (and speed if overflows are allowed). If the keys
are very long, the hash function will take too long to compute.
Furthermore, the early characters will wind up being left-shifted out
of the eventual answer. A common practice in this case is not to
use all the characters. The length and properties of the keys would
then influence the choice. For instance, the keys could be a
complete street address. The hash function might include a couple
of characters from the street address and perhaps a couple of
characters from the city name and ZIP code. Some programmers
implement their hash function by using only the characters in the
odd spaces, with the idea that the time saved computing the hash
function will make up for a slightly less evenly distributed function.
The main programming detail left is collision resolution. If,
when inserting an element, it hashes to the same value as an
already inserted element, then we have a collision and need to
resolve it. There are several methods for dealing with this. We will
discuss two of the simplest: open hashing and closed hashing.*
*These are also commonly known as separate chaining and open
addressing, respectively.
2.3. OPEN HASHING (SEPARATE CHAINING)
The first strategy, commonly known as either open hashing,
or separate chaining, is to keep a list of all elements that hash to
the same value. We assume that the keys are the first 10 perfect
squares and that the hashing function is simply hash(x) = x mod 10.
(The table size is not prime, but is used here for simplicity.) Figure
2.6 should make this clear.
To perform a find, we use the hash function to determine
which list to traverse. We then traverse this list in the normal
manner, returning the position where the item is found. To perform
an insert, we traverse down the appropriate list to check whether
the element is already in place (if duplicates are expected, an extra
field is usually kept, and this field would be incremented in the
event of a match). If the element turns out to be new, it is inserted
either at the front of the list or at the end of the list, whichever is
easiest. This is an issue most easily addressed while the code is
being written. Sometimes new elements are inserted at the front of
the list, since it is convenient and also because frequently it
happens that recently inserted elements are the most likely to be
accessed in the near future.